Learning to rank has recently emerged as an attractive technique to train deep convolutional neural networks for various computer vision tasks. Pairwise ranking, in particular, has been successful in multi-label image classification, achieving state-of-the-art results on various benchmarks. However, most existing approaches use the hinge loss to train their models, which is non-smooth and thus difficult to optimize, especially with deep networks. Furthermore, they employ simple heuristics, such as top-k or thresholding, to determine which labels to include in the output from a ranked list of labels, which limits their use in the real-world setting. In this work, we propose two techniques to improve pairwise ranking based multi-label image classification: (1) we propose a novel loss function for pairwise ranking, which is smooth everywhere and thus easier to optimize; and (2) we incorporate a label decision module into the model, estimating the optimal confidence thresholds for each visual concept. We provide theoretical analyses of our loss function in the Bayes consistency and risk minimization framework, and show its benefit over existing pairwise ranking formulations. We demonstrate the effectiveness of our approach on three large-scale datasets, VOC2007, NUS-WIDE and MS-COCO, achieving the best reported results in the literature.
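To illustrate the contrast between the two loss families the abstract describes, here is a minimal sketch: a classic pairwise ranking hinge loss (non-smooth at the margin boundary) next to a log-sum-exp surrogate that is differentiable everywhere. The exact formulation and the label decision module are defined in the paper itself; the function names and the specific smoothing below are illustrative assumptions, not the paper's method verbatim.

```python
import numpy as np

def pairwise_hinge_loss(scores, positives, margin=1.0):
    """Classic pairwise ranking hinge loss.

    Sums max(0, margin + s_neg - s_pos) over all (positive, negative)
    label pairs; the max(0, .) kink makes it non-smooth.
    """
    pos = [scores[i] for i in range(len(scores)) if i in positives]
    neg = [scores[i] for i in range(len(scores)) if i not in positives]
    return sum(max(0.0, margin + sn - sp) for sp in pos for sn in neg)

def smooth_pairwise_loss(scores, positives):
    """Hypothetical smooth surrogate: log(1 + sum exp(s_neg - s_pos)).

    The log-sum-exp over pairwise score differences upper-bounds the
    ranking error while remaining differentiable everywhere, which is
    the property the abstract highlights for gradient-based training.
    """
    pos = np.array([scores[i] for i in range(len(scores)) if i in positives])
    neg = np.array([scores[i] for i in range(len(scores)) if i not in positives])
    diffs = neg[None, :] - pos[:, None]  # one entry per (pos, neg) pair
    return float(np.log1p(np.exp(diffs).sum()))
```

Note that the smooth loss stays strictly positive and keeps shrinking as positive scores pull further ahead of negative ones, whereas the hinge loss goes exactly to zero once every pair clears the margin and provides no further gradient.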